A De Novo Whole Genome Assembly and Annotation of Parelaphostrongylus tenuis

Abstract Parelaphostrongylus tenuis causes ungulate morbidity and mortality in eastern and central North America, but no reference genome sequence exists to facilitate research. Here, we present a P. tenuis genome assembly and annotation, generated with PacBio and Illumina technologies. The assembly is 491 Mbp, with 7,285 scaffolds and an N50 of 185 kbp.


Announcement
Climate change is predicted to increase the geographic range of Protostrongylid nematodes, which cause morbidity and mortality in many wild and domestic ungulate species (Carreno and Hoberg, 1999; Kutz et al., 2005). In North America, Parelaphostrongylus tenuis has already expanded its range northward in the last half-century, affecting the persistence and management of wildlife and domestic species (Lankester, 2001; Pickles et al., 2013). P. tenuis is a driver of moose population decline in the north-central United States and south-central Canada (Lankester, 2010; Carstensen et al., 2017) and impedes translocations and reintroductions of caribou (Vors and Boyce, 2009), mule deer (Oates et al., 2000), and elk (Samuel et al., 1992). It also causes neurological symptoms and mortality in a variety of domestic species (Keane et al., 2022). Currently, there is no publicly available reference genome sequence for P. tenuis or any other member of the family Protostrongylidae. Hence, the generation of a P. tenuis reference genome and annotation is a significant advance in the molecular study of both P. tenuis and Protostrongylids. Here, we present a high-quality de novo genome assembly and annotation of P. tenuis. This sequence will aid wildlife conservation and domestic animal husbandry by facilitating future studies of Protostrongylid transmission and evolution.
On October 23, 2019, we extracted two adult P. tenuis nematodes, using methods described in Slomke et al. (1995), from the head of a single hunter-harvested white-tailed deer doe near Rochester, Minnesota. We determined the nematodes to be female based on their large size relative to males (Lankester, 2001). We then flash-froze the specimens in nitrogen and stored them at -80°C until DNA extraction two weeks later. We combined the two worms for a single DNA extraction, which the University of Minnesota Genomics Center (UMGC; St. Paul, MN) performed using the Gentra Puregene Tissue kit (Qiagen, Hilden, Germany). UMGC then used the Genomic DNA ScreenTape System and TapeStation software (Agilent Technologies, Santa Clara, CA) to confirm sufficient DNA mass and quality for downstream applications. The DNA yield from the nematodes was 3.86 µg, with absorbance ratios of 1.4 at 260/230 nm and 1.85 at 260/280 nm. The DNA integrity number (scale of 1 to 10, with 1 being highly degraded and 10 being highly intact) was 8.9, and 93.11% of the fragments were between 12,198 and >60,000 bp. UMGC performed library preparation on this DNA using the PacBio SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences of California Inc., Menlo Park, CA) and carried out sequencing on a PacBio Sequel using three 1M v3 SMRT cells.
We used PacBio SMRT® Tools to create circular consensus sequences from the raw reads and to perform quality filtering. We removed reads with a predicted accuracy below 80%, a consensus read length below 100 bp, or a consensus read length above 745,000 bp. Among the approximately 1.4M reads that passed filters, the average Phred quality score was 8, and the mean length was 8,000 bp. We assembled the consensus reads de novo with Flye (v2.7; Kolmogorov et al., 2019). Using QUAST (Gurevich et al., 2013), we generated quality statistics for our assembly. Our final assembly was 491 Mbp, with a coverage of 23X. Our assembly contained 7,285 scaffolds and had an N50 of 185 kbp (Table 1). Based on these statistics, P. tenuis has one of the larger genomes in the order Strongylidae (Table 2). Our assembly quality is also in the top half of publicly available genomes in Strongylidae based on N50 and number of scaffolds.
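To illustrate how the N50 and coverage figures above are defined, the following minimal Python sketch computes an N50 from a list of scaffold lengths and estimates coverage from the approximate read count, mean read length, and assembly size reported here. The function names and the toy scaffold lengths are hypothetical and serve only as an example; the statistics in this announcement were generated with QUAST.

```python
def n50(lengths):
    """Return the N50: the largest length L such that scaffolds of
    length >= L together cover at least half of the total assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


def estimated_coverage(n_reads, mean_read_len, assembly_size):
    """Approximate sequencing depth as total read bases / assembly size."""
    return n_reads * mean_read_len / assembly_size


# Toy scaffold lengths (hypothetical): total 300, so the N50 is reached
# once the running sum passes 150, i.e. at the 80 bp scaffold.
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_lengths)

# Approximate values from the text: ~1.4M reads, 8,000 bp mean length,
# 491 Mbp assembly. The result rounds to the reported 23X.
coverage = estimated_coverage(1_400_000, 8_000, 491_000_000)
print(toy_n50, round(coverage))
```

Because the read count and mean length are rounded in the text, the estimate is only approximate.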
To identify and annotate repetitive elements in the genome assembly, we built a custom repeat library from our assembly with RepeatModeler (v2.0.1; Smit and Hubley, 2019). We used RepeatMasker (v4.0.5; Smit et al., 2015) to combine this custom library with a standard RepeatMasker library and then ran the program in sensitive mode to find repeats in our assembly. The assembly contained 7.17% repetitive content (160,749 repeat elements, 35,213,890 bp), comprised largely of long interspersed nuclear elements (LINEs; 31,166 elements, 4.5% of genome sequence), DNA transposons (42,041 elements, 1.12% of genome sequence), long terminal repeats (LTRs; 11,592 elements, 0.16% of genome sequence), and short interspersed nuclear elements (SINEs; 3,590 elements, 0.05% of genome sequence). These repetitive elements were labeled and masked with RepeatMasker so as not to interfere with later annotation steps. This amount of repeat content is low relative to other large nematode genomes, and it is possible that undermasking explains the large number of protein-coding genes predicted.
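As a quick arithmetic check on the repeat summary above, the sketch below recomputes the overall repeat fraction from the masked base count and tallies what remains after the four named classes. It assumes the rounded assembly size of 491 Mbp (the exact base count is not given in the text), and the variable names are illustrative only.

```python
ASSEMBLY_BP = 491_000_000   # rounded assembly size (assumption: exact size not stated)
REPEAT_BP = 35_213_890      # masked bases reported in the text
TOTAL_ELEMENTS = 160_749    # repeat elements reported in the text
REPORTED_TOTAL_PCT = 7.17   # overall repeat content reported in the text

# Element counts and percent of genome sequence for the four named classes.
classes = {
    "LINE": (31_166, 4.5),
    "DNA transposon": (42_041, 1.12),
    "LTR": (11_592, 0.16),
    "SINE": (3_590, 0.05),
}

# Overall repeat fraction recomputed from base counts.
repeat_pct = 100 * REPEAT_BP / ASSEMBLY_BP

# Elements and genome share not accounted for by the four named classes.
other_elements = TOTAL_ELEMENTS - sum(n for n, _ in classes.values())
other_pct = round(REPORTED_TOTAL_PCT - sum(p for _, p in classes.values()), 2)

print(round(repeat_pct, 2), other_elements, other_pct)
```

The remainder corresponds to repeat categories that are not broken out individually in the text.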

JOURNAL OF NEMATOLOGY
We used RNA sequencing data to inform the gene annotation process. The RNA libraries for this step were prepared from a single, whole, adult, female P. tenuis worm from a hunter-harvested white-tailed deer from Oak Ridge, Tennessee. The nematode was collected and stored in RNAlater (Thermo Fisher Scientific Inc, Waltham, MA) at -20°C. RNA was enriched using the MasterPure RNA Purification kit (Illumina Inc, San Diego, CA) and associated protocol. After RNA extraction and purification, a transcriptomic library was prepared using the Illumina TruSeq RNA-seq protocol. RNA was converted into cDNA using RT-PCR. Sequencing was performed with an Illumina MiSeq (Illumina Inc, San Diego, CA) at the University of Tennessee Genomics Core (Knoxville, TN). Purified RNA was loaded at 6 picomolar with 5% 6 picomolar phiX as a control on a version 3 flow cell reading 250 bases, paired end. The 20,116,257 RNA-seq reads that passed quality control were trimmed with Trimmomatic (v0.39; Bolger et al., 2014) (settings used: ILLUMINACLIP:${ADAPTERS}:4:15:7:2:true LEADING:0 TRAILING:0 SLIDINGWINDOW:4:15 MINLEN:75) and then aligned to the P. tenuis whole genome assembly with the spliced read aligner STAR (v2.7.1a; Dobin et al., 2013) using the following settings: --alignIntronMin 10 --alignIntronMax 10000 --outFilterMultimapNmax 20. In total, 16,594,862 (82.5%) reads mapped uniquely to the genome, 1,022,828 (5.0%) reads mapped to multiple locations, and 2,471,022 (12.3%) were too short to map to the assembly.
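The SLIDINGWINDOW:4:15 and MINLEN:75 settings above can be read as follows: a read is scanned from its 5' end in windows of four bases and cut where the mean quality in a window first drops below 15, and reads shorter than 75 bases after trimming are discarded. The Python sketch below is a simplified illustration of that idea, not Trimmomatic's actual implementation, which differs in detail; the function names and the example quality values are hypothetical.

```python
def sliding_window_trim(quals, window=4, min_avg=15):
    """Return the number of bases to keep from the 5' end.

    Simplified model: cut at the start of the first window whose mean
    Phred quality is below min_avg. Reads shorter than the window are
    left untrimmed.
    """
    for start in range(0, len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_avg:
            return start
    return len(quals)


def passes_minlen(kept_length, minlen=75):
    """Mimic a minimum-length filter applied after trimming."""
    return kept_length >= minlen


# Hypothetical read: 80 high-quality bases followed by 20 low-quality bases.
example_quals = [30] * 80 + [2] * 20
kept = sliding_window_trim(example_quals)
print(kept, passes_minlen(kept))
```

In this toy case the cut falls at the first window dominated by low-quality bases, and the trimmed read is still long enough to be retained.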
The funannotate pipeline predicted 38,371 gene models and identified 29,657 protein-coding genes (Table 1). The average gene length was 4,088 bp, with a maximum length of 128,078 bp. Using the predicted protein sequences from the gene models, we assessed proteome completeness with BUSCO and the nematoda_odb10 lineage dataset. In our annotation, 71% (2,224) of the BUSCO proteins were complete and 7.1% (223) were fragmented. We did not detect 21.9% (684) of the BUSCO proteins.
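The BUSCO percentages above follow directly from the three category counts. The short Python sketch below reproduces them; the function name is hypothetical, and treating the missing share as the remainder after rounding the other two categories (so that the three sum to 100%) is an assumption about how the reported 21.9% was obtained.

```python
def busco_summary(complete, fragmented, missing):
    """Summarize BUSCO category counts as percentages of the total.

    The missing percentage is taken as the remainder after rounding the
    complete and fragmented shares (an assumption, see lead-in).
    """
    total = complete + fragmented + missing
    complete_pct = round(100 * complete / total, 1)
    fragmented_pct = round(100 * fragmented / total, 1)
    missing_pct = round(100 - complete_pct - fragmented_pct, 1)
    return {
        "total": total,
        "complete_pct": complete_pct,
        "fragmented_pct": fragmented_pct,
        "missing_pct": missing_pct,
    }


# Counts reported in the text.
summary = busco_summary(complete=2224, fragmented=223, missing=684)
print(summary)
```

The counts sum to 3,131 BUSCO proteins in total.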
We anticipate this de novo genome assembly will facilitate a broad range of studies aimed at investigating the evolution and biology of P. tenuis and other Protostrongylids. For example, our team has leveraged the assembly as a reference for reduced-representation methods facilitating population-level insights into the transmission of P. tenuis. Additionally, the annotation opens the door for genome-wide association studies, which may identify a genetic basis for pathogenicity in brainworm. This information might also be used to design vaccines or treatments to reduce morbidity and mortality in moose and other aberrant hosts.

Table 1: Quality measures and descriptive statistics for our genome assembly and annotation of Parelaphostrongylus tenuis.

Table 2: A comparison of descriptive statistics of the Strongylid family genomes available in WormBase ParaSite.